  • Stata Programming: add and remove variables from `varlist' local macro

    Hi everyone,

    I'm writing a program which takes binary, categorical, and continuous variables in varlist. However, since the program I'm writing will need all k-level categorical variables converted to k-1 dummy variables, I'm currently trying to add this functionality to the program. Where I'm getting stuck is: I want the original categorical variable name removed from varlist and the new dummy variable names added to varlist. (I may eventually change it to have the user only enter the categorical variables in the option, to remove the step of having to remove it from varlist, but want to make it as general as possible right now.) Eventually, `varlist' will be passed to a Mata function for more processing. Below I've reproduced part of the Stata program:

    Code:
    capture program drop testing
    program define testing
        version 14
        syntax varlist(min=1 numeric) [if] [in], [categorical(varlist)]
        
        marksample touse
        
        /* categorical variables */
        if "`categorical'"!="" {
            local ncatvars: word count `categorical'
            tokenize `categorical'
            
            forval i=1/`ncatvars' {
                /* generate dummy variables */
                tabulate ``i'', gen(``i''_)
                /* drop one of the dummy variables */
                drop ``i''_1
                /* count number of dummy variables */
                local ``i''_num=r(r) - 1
                forval j=2/```i''_num' {
                        /* code will go here to add the dummy variables to `varlist' */
                }
                /* code here to remove original categorical variables from varlist */
            }
        }
        
    di "Varlist contains: `varlist'"
    end
    
    testing var1 var10, categorical(var10)
    Code:
    . testing var1 var10, categorical(var10)
    
          var10 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          4       22.22       22.22
              2 |          3       16.67       38.89
              3 |          4       22.22       61.11
              4 |          7       38.89      100.00
    ------------+-----------------------------------
          Total |         18      100.00
    Varlist contains: var1 var10
    As you can see, `varlist' still contains var1 and var10. Once I figure out the code, what I want it to contain is var1 var10_2 var10_3 var10_4. Thank you in advance for your help. I'm using Stata 14.

    John

  • #2
    I would approach this problem differently: I would allow users to specify factor variables. That way they can use standard notation for Stata to identify which variables are categorical and which are not, and the indicator (dummy) variables are made by Stata.
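A minimal sketch of what this could look like. The program name testing_fv and the generated data are made up for illustration; the fv specifier of syntax and the fvexpand command are standard Stata.

```stata
* Illustrative data (not from the thread): var10 takes the values 1 to 4
clear
set obs 18
generate var1  = _n
generate var10 = mod(_n, 4) + 1

capture program drop testing_fv        // hypothetical program name
program define testing_fv, rclass
    version 14
    * fv lets the caller write i.var10 instead of creating dummies by hand
    syntax varlist(min=1 numeric fv) [if] [in]
    marksample touse

    * fvexpand spells out the individual levels and leaves them in r(varlist)
    fvexpand `varlist' if `touse'
    local expanded `r(varlist)'
    display "Expanded varlist: `expanded'"
    return local expanded "`expanded'"
end

testing_fv var1 i.var10
```

With this design the user marks the categorical variables with i. in the call, so no separate categorical() option is needed.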
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Hi Maarten,

      Thank you for the response. One other option we had considered is requiring users to create the dummy variables themselves. That would mean making the requirement clear in the documentation and perhaps adding a warning in the program itself.

      Does anyone have a programming solution to the question posed? Otherwise, I may just implement Maarten's suggestion or require the user to input dummy variables themselves.

      Thank you!
      John


      • #4
        Hello John,

        Try treating varlist and categorical as macro lists, which they are.

        So, your solution would be something along the lines of the following:
        Code:
        local varlist : list varlist-categorical
        See "[P] macro lists" for more info.
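As a small illustration of the macro list operators, here is a sketch that uses plain local macros in place of the program's `varlist' and `categorical'. The dummy names are assumed for the example.

```stata
* Plain macros standing in for the program's `varlist' and `categorical'
local varlist     "var1 var10"
local categorical "var10"
local dummies     "var10_2 var10_3 var10_4"    // assumed dummy names

* 1 if every element of categorical appears in varlist, 0 otherwise
local isin : list categorical in varlist

* A - B: drop the elements of B from A
local varlist : list varlist - categorical

* A | B: union, keeping A's order and appending what is new from B
local varlist : list varlist | dummies

display "`isin'"
display "`varlist'"
```

Note that the operands are macro names, not macro contents, so they are written without the `' quotes.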


        • #5
          Just what I needed! Thank you, Roger.

          Here's my final code fragment:

          Code:
          capture program drop testing
          program define testing
              version 14
              syntax varlist(min=1 numeric) [if] [in], [categorical(varlist)]
              
              marksample touse
              
              
              di "varlist contains: `varlist'"
              
              /* categorical variables */
              if "`categorical'"!="" {
                  local ncatvars: word count `categorical'
                  tokenize `categorical'
                  
                  forval i=1/`ncatvars' {
                      /* generate dummy variables */
                      tabulate ``i'', gen(``i''_)
                      /* drop one of the dummy variables */
                      drop ``i''_1
                      /* count number of dummy variables */
                      local ``i''_num=r(r) - 1
                      forval j=1/```i''_num' {
                          local k=`j'+1
                          local add "``i''_`k'"
                          /* code that adds the dummy variables to `varlist' */
                          local varlist : list varlist | add
                      }
                      /* code that removes categorical variables from varlist */
                      local varlist : list varlist-categorical
                  }
              }
              
          di "Varlist contains: `varlist'"
          end
          
          testing var1 var10, categorical(var10)
          Code:
          . testing var1 var10, categorical(var10)
          varlist contains: var1 var10
          
                var10 |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |          4       22.22       22.22
                    2 |          3       16.67       38.89
                    3 |          4       22.22       61.11
                    4 |          7       38.89      100.00
          ------------+-----------------------------------
                Total |         18      100.00
          Varlist contains: var1 var10_2 var10_3 var10_4
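The same logic can be written somewhat more compactly with foreach. The following is only a sketch: the program name testing2 and the generated data are made up, the table output is suppressed with quietly, and tabulate is restricted to the `touse' sample.

```stata
* Illustrative data (not from the thread): var10 takes the values 1 to 4
clear
set obs 18
generate var1  = _n
generate var10 = mod(_n, 4) + 1

capture program drop testing2          // hypothetical program name
program define testing2, rclass
    version 14
    syntax varlist(min=1 numeric) [if] [in], [categorical(varlist numeric)]
    marksample touse

    foreach v of local categorical {
        * one indicator per level, named `v'_1, `v'_2, ...
        quietly tabulate `v' if `touse', generate(`v'_)
        local k = r(r)
        drop `v'_1                     // first level is the reference

        * collect the names of the remaining indicators
        local dummies
        forvalues j = 2/`k' {
            local dummies `dummies' `v'_`j'
        }

        * remove the categorical variable, then append its indicators
        local varlist : list varlist - v
        local varlist : list varlist | dummies
    }

    display "Varlist contains: `varlist'"
    return local varlist "`varlist'"
end

testing2 var1 var10, categorical(var10)
```

Looping over the macro directly avoids tokenize and the nested ``i'' references, and removing only the current variable keeps each pass of the loop self-contained.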


          • #6
            What happens if a categorical variable has only one category? Your code will ignore such variables. Is this what you want?

            Best
            Daniel


            • #7
              Not the main question and I guess it's on the to do list, but I note that you create a temporary variable with marksample and then never use it.


              • #8
                In answer to the two questions above:

                Daniel (#6) asked: "What happens if a categorical variable has only one category? Your code will ignore such variables. Is this what you want?"

                Edited because I misunderstood the question. See my follow-up response in the next post below.

                From #7: "Not the main question and I guess it's on the to do list, but I note that you create a temporary variable with marksample and then never use it."

                Yes, it is extraneous to this code fragment, but in the main larger program it will eventually be used.


                Thanks everyone for your help!
                John
                Last edited by John Gallis; 21 Feb 2017, 11:35.


                • #9
                  Daniel (#6) asked: "What happens if a categorical variable has only one category? Your code will ignore such variables. Is this what you want?"

                  For the particular function I'm writing we don't want a categorical variable which has no variation (i.e., only one category). Thank you for pointing out this detail I overlooked (about what happens with categorical variables with only one level).

                  Edit: I think I will add a warning message or logic that will give an error if there is only one level, as for the particular application I'm implementing, it doesn't make sense to include categorical variables with no variation. The code removes the variable entirely if it's placed in the categorical() option, but I think it will be good to let the user know what's happening by giving a warning or error.
                  Last edited by John Gallis; 21 Feb 2017, 11:37.
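One way such a check could look is sketched below. The program name check_levels, the generated data, and the choice of return code 459 are assumptions for illustration, not part of the code posted above.

```stata
* Illustrative data (not from the thread): const has a single level
clear
set obs 18
generate var10 = mod(_n, 4) + 1
generate const = 1

capture program drop check_levels      // hypothetical program name
program define check_levels
    version 14
    syntax varlist(numeric) [if] [in]
    marksample touse

    foreach v of local varlist {
        * r(r) holds the number of distinct levels tabulated
        quietly tabulate `v' if `touse'
        if r(r) < 2 {
            display as error "`v' has only one level in the sample"
            exit 459
        }
    }
end

check_levels var10                     // passes silently
capture noisily check_levels const     // shows the error without stopping
display _rc
```

Replacing exit 459 with a plain display would turn the error into a warning, if stopping the program is too strict for the application.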


                  • #10
                    If the aim is to build a matrix of covariates to pass to Mata later, and some of them are categorical, I would definitely go for the factor-variable syntax (as suggested by Maarten).

                    Code:
                    syntax varlist(fv) ...
                    Use fvexpand if necessary; I believe _rmcoll can check for multicollinearity with factor variables.

                    In Mata you can write, for example:


                    Code:
                    X = st_data(.,"i.x1 x2 ibn.x3","`touse'")
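Wrapped in a program, this might look like the following sketch. The program name fv_to_mata and the generated data are made up for illustration; the program only reports how many columns Mata receives, where a real program would go on to use the matrix.

```stata
* Illustrative data (not from the thread): var10 takes the values 1 to 4
clear
set obs 18
set seed 12345
generate var1  = rnormal()
generate var10 = mod(_n, 4) + 1

capture program drop fv_to_mata        // hypothetical program name
program define fv_to_mata, rclass
    version 14
    syntax varlist(min=1 numeric fv) [if] [in]
    marksample touse

    * st_data() accepts factor-variable notation and a selection variable;
    * the column count is stored in a temporary Stata scalar
    tempname k
    mata: st_numscalar("`k'", cols(st_data(., "`varlist'", "`touse'")))

    display "Columns passed to Mata: " `k'
    return scalar k = `k'
end

fv_to_mata var1 ibn.var10
```

Here ibn.var10 requests one indicator per level with no base level, so the matrix has one column for var1 plus one for each of the four levels.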
                    Here is another toy example:

                    Code:
                    clear
                    
                    set obs 10
                    gen x = floor(5*runiform()) // I have version 12, where runiformint() is not available
                    list
                    
                    mata: st_data(.,"ibn.x")

                    Code:
                    . clear
                    
                    .
                    . set obs 10
                    obs was 0, now 10
                    
                    . gen x = floor(5*runiform())
                    
                    . list
                    
                         +---+
                         | x |
                         |---|
                      1. | 1 |
                      2. | 0 |
                      3. | 2 |
                      4. | 4 |
                      5. | 2 |
                         |---|
                      6. | 4 |
                      7. | 1 |
                      8. | 2 |
                      9. | 1 |
                     10. | 4 |
                         +---+
                    
                    .
                    . mata: st_data(.,"ibn.x")
                            1   2   3   4
                         +-----------------+
                       1 |  0   1   0   0  |
                       2 |  1   0   0   0  |
                       3 |  0   0   1   0  |
                       4 |  0   0   0   1  |
                       5 |  0   0   1   0  |
                       6 |  0   0   0   1  |
                       7 |  0   1   0   0  |
                       8 |  0   0   1   0  |
                       9 |  0   1   0   0  |
                      10 |  0   0   0   1  |
                         +-----------------+
                    Last edited by Christophe Kolodziejczyk; 21 Feb 2017, 12:18.
